{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Chapter 11",
      "provenance": [],
      "collapsed_sections": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/jo-cho/advances_in_financial_machine_learning/blob/master/Chapter_11.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8pN6UDGM-0PV",
        "colab_type": "text"
      },
      "source": [
        "“Seven Sins of Quantitative Investing” (Luo et al. [2014])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "c631MA2a-1Ls",
        "colab_type": "text"
      },
      "source": [
        "- 1. **Survivorship bias**: Using as investment universe the current one, hence ignoring that some companies went bankrupt and securities were delisted along the way."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rbUfjTuH-5Xn",
        "colab_type": "text"
      },
      "source": [
        "- 2. **Look-ahead bias**: Using information that was not public at the moment the simulated decision would have been made. Be certain about the timestamp for\n",
        "each data point. Take into account release dates, distribution delays, and backfill corrections."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "OELcF7jy_RhI",
        "colab_type": "text"
      },
      "source": [
        "- 3. **Storytelling**: Making up a story ex-post to justify some random pattern."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ypJGnvpm_Y0H",
        "colab_type": "text"
      },
      "source": [
        "- 4. **Data mining and data snooping**: Training the model on the testing\n",
        "set."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0RAZiYbP_fpO",
        "colab_type": "text"
      },
      "source": [
        "- 5. **Transaction costs**: Simulating transaction costs is hard because the only way to be certain about that cost would have been to interact with the trading book (i.e., to do the actual trade)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "X-jhurJ-_kd6",
        "colab_type": "text"
      },
      "source": [
        "- 6. **Outliers:** Basing a strategy on a few extreme outcomes that may never happen again as observed in the past."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vAzdAIyi_tSB",
        "colab_type": "text"
      },
      "source": [
        "- 7. **Shorting**: Taking a short position on cash products requires finding a lender. The cost of lending and the amount available is generally unknown, and\n",
        "depends on relations, inventory, relative demand, etc."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AoAPA3GkGZpv",
        "colab_type": "text"
      },
      "source": [
        "- 0. Other common errors include computing performance using a non-standard\n",
        "method (Chapter 14); ignoring hidden risks; focusing only on returns while ignoring\n",
        "other metrics; confusing correlation with causation; selecting an unrepresentative\n",
        "time period; failing to expect the unexpected; ignoring the existence of stop-out limits\n",
        "or margin calls; ignoring funding costs; and forgetting practical aspects"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Imz1ddN3_4Dw",
        "colab_type": "text"
      },
      "source": [
        "A few general recommendations\n",
        "\n",
        "1. Develop models for entire asset classes or investment universes, rather than\n",
        "for specific securities (Chapter 8). Investors diversify, hence they do not make\n",
        "mistake X only on security Y. If you find mistake X only on security Y, nomatter\n",
        "how apparently profitable, it is likely a false discovery.\n",
        "2. Apply bagging (Chapter 6) as a means to both prevent overfitting and reduce\n",
        "the variance of the forecasting error. If bagging deteriorates the performance of\n",
        "a strategy, it was likely overfit to a small number of observations or outliers.\n",
        "3. Do not backtest until all your research is complete (Chapters 1–10).\n",
        "4. Record every backtest conducted on a dataset so that the probability of backtest\n",
        "overfitting may be estimated on the final selected result (see Bailey, Borwein,\n",
        "L´opez de Prado, and Zhu [2017a] and Chapter 14), and the Sharpe ratio may\n",
        "be properly deflated by the number of trials carried out (Bailey and L´opez de\n",
        "Prado [2014b]).\n",
        "5. Simulate scenarios rather than history (Chapter 12). A standard backtest is a\n",
        "historical simulation, which can be easily overfit. History is just the random\n",
        "path that was realized, and it could have been entirely different. Your strategy\n",
        "should be profitable under a wide range of scenarios, not just the anecdotal\n",
        "historical path. It is harder to overfit the outcome of thousands of “what if”\n",
        "scenarios.\n",
        "6. If the backtest fails to identify a profitable strategy, start from scratch. Resist\n",
        "the temptation of reusing those results. Follow the Second Law of Backtesting."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BBP59j1OAFdx",
        "colab_type": "text"
      },
      "source": [
        " - second law of backtesting: “Backtesting while researching is like drinking and driving.\n",
        "Do not research under the influence of a backtest.”"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "V9cOEOC8AtMt",
        "colab_type": "text"
      },
      "source": [
        "Exercises"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "O66NygXlAxU_",
        "colab_type": "text"
      },
      "source": [
        "## 1. \n",
        "An analyst fits an RF classifier where some of the features include seasonally adjusted employment data. He aligns with January data the seasonally adjusted value of January, etc. What “sin” has he committed?"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "foJk7tqAEe1F",
        "colab_type": "text"
      },
      "source": [
        "a. lookahead bias"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6MJ7j3MMEgxp",
        "colab_type": "text"
      },
      "source": [
        "##2.\n",
        "An analyst develops an ML algorithm where he generates a signal using closing\n",
        "prices, and executed at close. What’s the sin?"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_yMYASTBEjDq",
        "colab_type": "text"
      },
      "source": [
        "a. lookahead bias, transaction costs"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6qEsGCCYA2Qe",
        "colab_type": "text"
      },
      "source": [
        "##3. \n",
        "There is a 98.51% correlation between total revenue generated by arcades and\n",
        "computer science doctorates awarded in the United States. As the number of\n",
        "doctorates is expected to grow, should we invest in arcades companies? If not,\n",
        "what’s the sin?"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SEHFfsSWFE30",
        "colab_type": "text"
      },
      "source": [
        "a. stroytelling"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IeB2YwoIFXqK",
        "colab_type": "text"
      },
      "source": [
        "## 4.\n",
        "The Wall Street Journal has reported that September is the only month of the\n",
        "year that has negative average stock returns, looking back 20, 50, and 100 years.\n",
        "Should we sell stocks at the end of August? If not, what’s the sin?"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7w5usXP7FamO",
        "colab_type": "text"
      },
      "source": [
        "a. storytelling"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JLXpVmBOFr-c",
        "colab_type": "text"
      },
      "source": [
        "##5.\n",
        "We download P/E ratios from Bloomberg, rank stocks every month, sell the top\n",
        "quartile, and buy the long quartile. Performance is amazing. What’s the sin?"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "-pp6VS4CFyay",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        ""
      ],
      "execution_count": 0,
      "outputs": []
    }
  ]
}